On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

Authors

  • Johannes Fischer
  • Dominik Köppl
  • Florian Kurpicz
Abstract

We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with $p$ processors. Given a static text of length $n$, we first show how to compute the suffix array interval of a given pattern of length $m$ in $O(m/p + \lg p + \lg\lg p \cdot \lg\lg n)$ time for $p \le m$. For approximate pattern matching with $k$ differences or mismatches, we show how to compute all occurrences of a given pattern in $O\left(\frac{m^k \sigma^k}{p} \max(k, \lg\lg n) + \left(1 + \frac{m}{p}\right) \lg p \cdot \lg\lg n + \mathrm{occ}\right)$ time, where $\sigma$ is the size of the alphabet and $p \le \sigma^k m^k$. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: given the suffix array intervals for two patterns $P$ and $P'$, we present a data structure for computing the interval of $PP'$ in $O(\lg\lg n)$ sequential time, or in $O(1 + \lg_p \lg n)$ parallel time. All our data structures are of size $O(n)$ bits (in addition to the suffix array).

1998 ACM Subject Classification: I.1.2 Algorithms
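To make the interval-merging operation concrete, here is a minimal Python sketch. It is not the data structure from the paper: it assumes a plain inverse suffix array and uses binary search, so one merge costs O(lg n) rather than O(lg lg n). The function names (`suffix_array`, `sa_interval`, `merge_intervals`) and the 0-based half-open interval convention are illustrative assumptions. The sketch relies on one observation: all suffixes in the interval of P share the prefix P, so they are ordered by whatever follows P, and the ranks of those following suffixes are non-decreasing across the interval.

```python
from bisect import bisect_left, bisect_right


def suffix_array(text):
    """Naive suffix array (0-based); adequate for a small illustration."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_interval(text, sa, pattern):
    """Half-open interval [lo, hi) of suffixes having pattern as a prefix."""
    m = len(pattern)
    prefixes = [text[i:i + m] for i in sa]  # still sorted after truncation
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)


def merge_intervals(sa, isa, len_p, iv_p, iv_q):
    """Interval of PP' from the intervals of P (iv_p) and P' (iv_q).

    Assumption: plain binary search over the inverse suffix array `isa`,
    not the O(lg lg n) structure described in the abstract.
    """
    lo_p, hi_p = iv_p
    lo_q, hi_q = iv_q
    n = len(sa)

    def rank_after(i):
        # Rank of the suffix that follows the occurrence of P at sa[i];
        # -1 stands for the empty suffix past the end of the text.
        j = sa[i] + len_p
        return isa[j] if j < n else -1

    def first_at_least(target):
        # First index in [lo_p, hi_p) whose following suffix has rank >= target.
        lo, hi = lo_p, hi_p
        while lo < hi:
            mid = (lo + hi) // 2
            if rank_after(mid) < target:
                lo = mid + 1
            else:
                hi = mid
        return lo

    return first_at_least(lo_q), first_at_least(hi_q)


text = "AGATCGATA$"
sa = suffix_array(text)
isa = [0] * len(sa)
for rank, pos in enumerate(sa):
    isa[pos] = rank

merged = merge_intervals(sa, isa, 1,
                         sa_interval(text, sa, "A"),
                         sa_interval(text, sa, "TC"))
print(merged, sa_interval(text, sa, "ATC"))  # both are (4, 5)
```

An empty result (equal endpoints) means the concatenation PP' does not occur in the text.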

Similar resources

Efficient de novo assembly of large genomes using compressed data structures - Supplemental Materials and Methods

The suffix array is a compact representation of the lexicographic ordering of the suffixes of a text [1]. Each element of the array is an index into the original string; SA_X[i] = j indicates that the suffix starting at position j in X is the i-th lowest suffix of X. As an example, consider the string T = AGATCGATA$. The suffix array of T is SA_T = [10, 9, 1, 7, 3, 5, 6, 2, 8, 4]. As the suffix a...
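The example array above can be reproduced with a short Python sketch. It assumes 1-based positions, as in the snippet, and a sentinel `$` that sorts before every letter (true for ASCII); the function name `suffix_array_1based` is illustrative, and the naive sort is meant for small inputs only.

```python
def suffix_array_1based(text):
    """Start positions (1-based) of the suffixes of text in lexicographic order."""
    order = sorted(range(len(text)), key=lambda i: text[i:])
    return [i + 1 for i in order]


T = "AGATCGATA$"
SA_T = suffix_array_1based(T)
print(SA_T)  # [10, 9, 1, 7, 3, 5, 6, 2, 8, 4]
```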

Gapped Suffix Arrays: a New Index Structure for Fast Approximate Matching

Approximate searching using an index is an important application in many fields. In this paper we introduce a new data structure called the gapped suffix array for approximate searching in the Hamming distance model. Building on the well known filtration approach for approximate searching, the use of the gapped suffix array can improve search speed by avoiding the merging of position lists.

Ultra-fast Multiple Genome Sequence Matching Using GPU

In this paper, a comparative evaluation of massively parallel implementations of the suffix tree and the suffix array for accelerating genome sequence matching is presented, based on an Intel Core i7 3770K quad-core CPU and an NVIDIA GeForce GTX680 GPU. Besides the suffix array requiring only approximately 20%–30% of the space of the suffix tree, the coalesced binary search and tile optimization make suffix array clearl...

Trends in Suffix Sorting: A Survey of Low Memory Algorithms

The suffix array is a sorted array of all the suffixes in a string. This remarkably simple data structure is fundamental for string processing and lies at the heart of efficient algorithms for pattern matching, pattern mining, and data compression. In many applications suffix array construction, or equivalently suffix sorting, is a computational bottleneck and so has been the focus of intense r...

Suffix Arrays for Structural Strings

The structural match (s-match), originally addressed by the structural suffix tree, helps identify different RNA sequences with the same secondary structure. In this work, we introduce and construct the structural suffix array and structural longest common prefix array, i.e. lightweight suffix data structures for the s-match. Further, we illustrate how to use our data structures to support addi...

Publication date: 2016